Archisegment-based letter-to-phone conversion for concatenative speech synthesis in Portuguese

Authors

  • Eleonora Cavalcante Albano
  • Agnaldo Antonio Moreira
Abstract

A letter-to-phone conversion scheme is proposed for Portuguese which excludes representation of allophonic detail. Phonetically unstable segments are treated as archisegments, their articulatory weakness being analyzed in terms of feature underspecification. Besides solving classical problems of allophony and allomorphy, this analysis provides an efficient principle for building a unit inventory for concatenative speech synthesis.

1. PHONOLOGY AND LETTER-TO-PHONE CONVERSION

Concatenative speech synthesis depends crucially on the adequacy of the phonological analysis underlying its unit list. In the same way as intelligibility requires concatenative units to be based on a consistent minimal set of allophones, quality requires enriching the allophone inventory.

Allophony, however, is an old difficulty in phonological theory. Since the 1950s the literature has been discussing whether there is a clear line separating phonemes from allophones [1], or, in today's terms, phonological from phonetic segments. Contemporary laboratory research has, in addition, uncovered many cases of allophony where the variants, rather than falling into discrete categories, vary continuously along some phonetic dimension, making it difficult to talk about segments at the phonetic level [2].

From the point of view of text-to-speech conversion, these theoretical issues lead to persistent practical difficulties concerning the size of the phone inventory and the number of steps in letter-to-phone conversion. In concatenative systems, such questions may also affect the determination of unit size, since phonetically unstable allophones tend to coarticulate with more than one adjacent phone, thus requiring units larger than the diphone in order to be successfully represented.

In our laboratory, we have been dealing with these problems as part of an effort to develop a high-quality concatenative speech synthesis system for Brazilian Portuguese.
An earlier attempt within our own research group [3] has yielded a low-cost, low-quality TD-PSOLA system. Since our aim is academic, not commercial, we are now laying emphasis on the kinds of improvement that may be achieved through the solution of theoretical problems in the analysis of language and speech.

2. THE PORTUGUESE PROBLEM

Portuguese orthography is quasi-phonemic and thus very easy to convert to phones except in a few cases where allophony is extensive. Interestingly, such exceptions occur precisely where complexity at both the phonetic and the morphological level leads to ambiguity in phonological analysis.

Take, for example, the representation of rhyme nasality, which motivates the controversy over whether Portuguese has distinctive nasal vowels [4,5]. Such nasality, which tends to be rather heavy phonetically, is orthographically represented as 'm' or 'n' before a word-internal consonant (e.g., samba, santa, sanca) and as a tilde diacritic on the vowel before another vowel or word-finally (e.g., são, sã). This cannot be said to be a reasonable phonemic representation because /m/ and /n/ do not contrast in rhymes, a fact that orthography acknowledges by making the choice of one or the other letter dependent on the following consonant ('m' before 'p,b' and 'n' elsewhere). Nor can it be said to be a reasonable allophonic representation because the phonetic realization of nasal rhymes shows no correspondence with the diacritic/digraph distinction.

Acoustic phonetic studies [6,7] have in fact shown that such rhymes may surface as nasal vowels or as nasalized vowels followed by nasal murmurs, regardless of the orthographic representation. In our laboratory, we have, moreover, observed that this murmur has a variable length, ranging from negligible to sizable.
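
The orthographic facts just described can be illustrated with a minimal Python sketch. It is not the authors' conversion scheme: the symbol `N`, the function names `mark_nasal_rhymes` and `nasal_letter_before`, and the letter sets are hypothetical choices made for illustration only. The sketch collapses both spellings of rhyme nasality (letter 'm'/'n' before a consonant, or a tilde on the vowel) into one abstract nasal symbol, and shows that the choice between 'm' and 'n' is predictable from the following consonant.

```python
import unicodedata

NASAL = "N"  # hypothetical symbol for an abstract nasal segment
TILDE_VOWELS = {"ã": "a", "õ": "o"}
VOWELS = set("aeiouáéíóúâêôà")
# Assumption: 'h' is left out so that the digraph 'nh' is not read as a
# nasal rhyme; 'm' and 'n' themselves are left out as well.
CONSONANTS = set("bcçdfgjklpqrstvwxz")


def mark_nasal_rhymes(word):
    """Return a list of symbols with every nasal rhyme marked as vowel + N."""
    # NFC makes sure 'ã' arrives as one precomposed character.
    chars = unicodedata.normalize("NFC", word.lower())
    out = []
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if ch in TILDE_VOWELS:
            # Tilde spelling: plain vowel followed by the nasal symbol.
            out.extend([TILDE_VOWELS[ch], NASAL])
        elif ch in "mn" and out and out[-1] in VOWELS and nxt in CONSONANTS:
            # Letter spelling: 'm'/'n' after a vowel and before a consonant.
            out.append(NASAL)
        else:
            out.append(ch)
    return out


def nasal_letter_before(consonant):
    """Spelling rule from the text: 'm' before 'p' or 'b', 'n' elsewhere."""
    return "m" if consonant in {"p", "b"} else "n"


for w in ["samba", "santa", "sanca", "sã"]:
    print(w, mark_nasal_rhymes(w))
```

Under these assumptions, samba, santa and sanca all receive the same `a N` rhyme, as does sã, so the output no longer depends on the diacritic/digraph distinction. Where the nasal symbol should sit relative to a following vowel, as in são, is a choice of this sketch and not something the text settles.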
Such variation evokes the continuous segment reduction patterns discussed by Sproat and Fujimura [8] and Kohler [9], among others, being thus likely to be conditioned by higher level factors, such as speaker, speech rate, accent contour, and style.

Facts such as this argue against phonetic detail as the proper level for letter-to-phone conversion. Any transcription of Portuguese aiming at distinguishing between nasal rhymes pronounced with and without a nasal murmur would probably have to deal with the interaction between segments and prosody, a question which is complex from a theoretical point of view and costly from a computational point of view.

Other cases where allophones cannot be stated from simple context-sensitive rules exist in Portuguese [10] and are also attested in other languages [11]. To deal with them, letter-to-phone conversion has to aim at a relatively abstract level of analysis. The ensuing question recapitulates the major concern of the phonology of the 1970s [12]: how abstract should the proper level of representation be?

3. AN ARCHISEGMENTAL SOLUTION

The answer we propose for Portuguese is just abstract enough to solve problems at both the phonetic and the morphological level without giving up the convenience of generating a linear string of

4th International Conference on Spoken Language Processing (ICSLP 96), Philadelphia, PA, USA, October 3-6, 1996. ISCA Archive: http://www.isca-speech.org/archive

Similar resources

Concatenative speech synthesis for European Portuguese

This paper describes our ongoing work in the area of text-to-speech synthesis, specifically on concatenative techniques. Our preliminary work consisted in investigating the current trends in concatenative synthesis and the problems that could arise when we apply the existing state-of-the-art solutions to the specific case of European Portuguese. Our ultimate goal is to develop a text-to-speech ...


Automatic Discovery of Brazilian Portuguese Letter to Phoneme Conversion Rules through Genetic Programming

Letter to phoneme conversion is a basic step in Speech Synthesis processes. Traditionally, the activity involves the implementation of rules that define the mapping of letters into sounds. This paper presents results of the application of an evolutionary computation technique (Genetic Programming), in Brazilian Portuguese synthesis, aiming to discover automatically programs implementing specifi...


High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

Concatenative speech synthesis is a method that can produce natural-sounding speech with high speaker individuality by using a large speech corpus. Based on this method, we propose a voice conversion method whose converted speech has high individuality and naturalness. The authors also conducted two subjective evaluation experiments for evaluating individuality and ...


Diphone-Based Concatenative Speech Synthesis System for Mongolian

This paper describes the first Text-to-Speech (TTS) system for the Mongolian language, using the general speech synthesis architecture of Festival. The TTS is based on diphone concatenative synthesis, applying the TD-PSOLA technique. The conversion from input text to acoustic waveform is performed in a number of steps, each handled by a functional component. Procedures and functions for the s...


Concatenative Mandarin TTS Accommodating Isolated English Words

An experiment exploring a method for realizing a concatenative Chinese TTS that accommodates isolated English words is presented. The experiment was based on an existing concatenative Mandarin TTS system developed at the Motorola China Research Center. The experimental system employs an English word synthesizer based on the concatenation of speech segments stored in an English corpus. The original Engl...



Publication date: 1996